Geographical analysis of media flows

A multidimensional approach

Claude Grasland (Université de Paris (Diderot), FR 2007 CIST, UMR 8504 Géographie-cités)

featured

Introduction

1 A. Importation of RSS

1.1 The Mediacloud database

(tbd : presentation of the MediaCloud project)

Mediacloud can be freely used by researchers. All you have to do is to create an account at the following adress :

https://explorer.mediacloud.org

You have different ways to get title of news. We will focus here on a simple example of data obtained through the mediacloud interface. We suppose that you want to extract news from the Tunisian newspapers speaking from Europe.

1.2 Selection of media with source manager

We use the application called Source Manager and we introduce a research by collection which is the most convenient to explore what is available in a country. In our example, the target country is Tunisia and we have three collections that are propsed :

We have selected the collection named “Tunisia National” because we are interested in the most important newspapers of the country.

The buble graphic on the right indicates immediately the media that has produced the highest number of news, but it is wise to explore in more details the list on the left which indicates for each media the statting date of data collection.

When a media appears interesting, we click on its name to obtain a brief summary of the metadata. For example, in the case of L’économiste Maghrebin the metadata indicates :

The media looks promising, but before to go further, it can be better to have a look at the website of the media to have a more concrete idea of the content if we don’t know in advance what it is about in terms of content, what is the ideological orientation, etc.

Here we can see that this is an ecnomic journal, published in french, with news organized in concentric geographic circles (Nation > Maghreb > Africa > World) which is precisely what we are looking for in the IMAGEUN project. We will further complete the informations about this, but before to do that we have to check in more details if the production of the media is regular through time with another tool offered by mediacloud, the explorer.

1.3 Checking the stability through time

We have clicked on search in explorer on the metadata page of the Source Manager and obtain a news interfacce where we modify the date to cover the full period of collection of the media (or our period of interest). In the research field, we let the search term * which indicates a research on all news.

Below your request, you obtain a graphic entitled Attention Over Time with the distribution of the number of news published per day which help you to verify if the distribution of news is regular through time. You just have to modify the type of graphic in order to visualize Story Count and you can choose the time span you want (day, week or month) for the evaluation of the regularity of news flow. In our example, we notice that at daily level they are some brief period of break in 2019, but the flow is reasonnabely regular with approximatively 5 news per day at the beginning and 10 to 20 in the final period. We also notice a classical week cycle with a decrease of news published during the week-end.

Going down, you will find a news panel entitled Total Attention which gives you the total number of stories found. In our example, we have a total of 13626 stories produced by our media over the period.

1.5 Download and storage of news

According to your selection (all news or a specific topic) you will download more or less title. Here, me make the choice to get all news, which means that we have to repeat the original request with *.

Finally, by clicking on the button Download all story URLS, you can get a .csv file that you can easily load in your favorite programming language as we will see in the next section.

Bibliographie

BARNIER, Julien, 2021. rmdformats: HTML Output Formats and Templates for ’rmarkdown’ Documents [en ligne]. S.l. : s.n. Disponible à l'adresse : https://github.com/juba/rmdformats.
R CORE TEAM, 2020. R: A Language and Environment for Statistical Computing [en ligne]. Vienna, Austria : R Foundation for Statistical Computing. Disponible à l'adresse : https://www.R-project.org/.
XIE, Yihui, 2020. knitr: A General-Purpose Package for Dynamic Report Generation in R [en ligne]. S.l. : s.n. Disponible à l'adresse : https://CRAN.R-project.org/package=knitr.

Annexes

Infos session

setting value
version R version 4.1.0 (2021-05-18)
os Windows 10 x64
system x86_64, mingw32
ui RTerm
language (EN)
collate French_France.1252
ctype French_France.1252
tz Europe/Paris
date 2021-11-25
package ondiskversion source
dplyr 1.0.6 CRAN (R 4.1.0)
ggplot2 3.3.3 CRAN (R 4.1.0)
knitr 1.34 CRAN (R 4.1.1)
quanteda 3.0.0 CRAN (R 4.1.0)
rmarkdown 2.11 CRAN (R 4.1.1)
rzine 0.1.0 gitlab ()
tidytext 0.3.1 CRAN (R 4.1.1)

Citation

@Manual{ficheRzine,
    title = {Titre de la fiche},
    author = {{Auteur.e.s}},
    organization = {Rzine},
    year = {202x},
    url = {http://rzine.fr/},
  }


Glossaire